High-dimensional variable selection
This paper explores the following question: what kind of statistical
guarantees can be given when doing variable selection in high-dimensional
models? In particular, we look at the error rates and power of some multi-stage
regression methods. In the first stage we fit a set of candidate models. In the
second stage we select one model by cross-validation. In the third stage we use
hypothesis testing to eliminate some variables. We refer to the first two
stages as "screening" and the last stage as "cleaning." We consider three
screening methods: the lasso, marginal regression, and forward stepwise
regression. Our method gives consistent variable selection under certain
conditions.
Comment: Published in the Annals of Statistics (http://www.imstat.org/aos/) by the Institute of Mathematical Statistics (http://www.imstat.org), DOI: http://dx.doi.org/10.1214/08-AOS646.
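The screen-and-clean pipeline described above can be sketched with standard tools. The following is a minimal, hypothetical version (not the paper's code): it screens with a cross-validated lasso on one half of the data and cleans with Bonferroni-corrected t-tests on the held-out half; the data-generating setup and thresholds are illustrative assumptions.

```python
import numpy as np
from scipy import stats
from sklearn.linear_model import LassoCV, LinearRegression

rng = np.random.default_rng(0)
n, p = 200, 50
X = rng.standard_normal((n, p))
beta = np.zeros(p)
beta[:3] = [3.0, -2.0, 1.5]                 # three true signal variables
y = X @ beta + rng.standard_normal(n)

# Split the sample: screen on one half, clean on the other,
# so the cleaning tests do not reuse the screening data.
X1, y1 = X[:n // 2], y[:n // 2]
X2, y2 = X[n // 2:], y[n // 2:]

# Stages 1-2 ("screening"): fit a lasso path and select the
# penalty by cross-validation; keep the nonzero coefficients.
screened = np.flatnonzero(LassoCV(cv=5).fit(X1, y1).coef_)

# Stage 3 ("cleaning"): OLS t-tests on the held-out half with a
# Bonferroni cut over the screened set.
ols = LinearRegression(fit_intercept=False).fit(X2[:, screened], y2)
resid = y2 - ols.predict(X2[:, screened])
dof = len(y2) - len(screened)
sigma2 = resid @ resid / dof
se = np.sqrt(sigma2 * np.diag(np.linalg.inv(X2[:, screened].T @ X2[:, screened])))
pvals = 2 * stats.t.sf(np.abs(ols.coef_ / se), dof)
cleaned = screened[pvals < 0.05 / len(screened)]
print("screened:", screened.tolist(), "cleaned:", cleaned.tolist())
```

With strong signals, the screening stage keeps the true variables (plus some noise variables), and the cleaning stage removes most of the noise while retaining the signals.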
Genome-Wide Significance Levels and Weighted Hypothesis Testing
Genetic investigations often involve the testing of vast numbers of related
hypotheses simultaneously. To control the overall error rate, a substantial
penalty is required, making it difficult to detect signals of moderate
strength. To improve the power in this setting, a number of authors have
considered using weighted p-values, with the motivation often based upon the
scientific plausibility of the hypotheses. We review this literature, derive
optimal weights and show that the power is remarkably robust to
misspecification of these weights. We consider two methods for choosing weights
in practice. The first, external weighting, is based on prior information. The
second, estimated weighting, uses the data to choose weights.
Comment: Published in Statistical Science (http://www.imstat.org/sts/) by the Institute of Mathematical Statistics (http://www.imstat.org), DOI: http://dx.doi.org/10.1214/09-STS289.
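The weighted-Bonferroni idea behind weighted p-values can be sketched in a few lines. In this common formulation, hypothesis i is rejected when p_i <= w_i * alpha / m, with the weights averaging to one so the overall level alpha is preserved; the particular weights below are illustrative, not derived from the paper.

```python
import numpy as np

def weighted_bonferroni(pvals, weights, alpha=0.05):
    """Reject H_i when p_i <= w_i * alpha / m; mean(weights) must be 1."""
    pvals = np.asarray(pvals, dtype=float)
    weights = np.asarray(weights, dtype=float)
    assert np.isclose(weights.mean(), 1.0), "weights must average to 1"
    return pvals <= weights * alpha / len(pvals)

# Up-weighting a scientifically plausible hypothesis can rescue a
# moderate signal that plain Bonferroni (all weights equal) would miss.
pvals = [1e-4, 0.02, 0.40, 0.90]
flat = weighted_bonferroni(pvals, [1, 1, 1, 1])             # plain Bonferroni
prior = weighted_bonferroni(pvals, [0.5, 3.0, 0.25, 0.25])  # informed weights
print(flat.tolist(), prior.tolist())
```

Here the second hypothesis (p = 0.02) fails the flat threshold 0.05/4 = 0.0125 but passes its up-weighted threshold 3 * 0.0125 = 0.0375, while the overall budget is unchanged because the weights sum to m.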
Stability Approach to Regularization Selection (StARS) for High Dimensional Graphical Models
A challenging problem in estimating high-dimensional graphical models is to
choose the regularization parameter in a data-dependent way. The standard
techniques include K-fold cross-validation (K-CV), Akaike information
criterion (AIC), and Bayesian information criterion (BIC). Though these methods
work well for low-dimensional problems, they are not suitable in high
dimensional settings. In this paper, we present StARS: a new stability-based
method for choosing the regularization parameter in high dimensional inference
for undirected graphs. The method has a clear interpretation: we use the least
amount of regularization that simultaneously makes a graph sparse and
replicable under random sampling. This interpretation requires essentially no
conditions. Under mild conditions, we show that StARS is partially sparsistent
in terms of graph estimation: i.e. with high probability, all the true edges
will be included in the selected model even when the graph size diverges with
the sample size. Empirically, the performance of StARS is compared with the
state-of-the-art model selection procedures, including K-CV, AIC, and BIC, on
both synthetic data and a real microarray dataset. StARS outperforms all these
competing procedures.
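The stability computation at the heart of StARS can be sketched with sklearn's graphical lasso. This is a rough, assumed implementation (the subsample count, fraction, and helper name are illustrative, not the authors' code): for a given penalty, it estimates how often each edge is selected across random subsamples and averages the per-edge instability 2*theta*(1-theta).

```python
import numpy as np
from sklearn.covariance import GraphicalLasso

def edge_instability(X, alpha, n_subsamples=20, frac=0.8, seed=0):
    """Mean edge instability 2*theta*(1-theta), where theta is the
    frequency with which each edge appears across random subsamples."""
    rng = np.random.default_rng(seed)
    n, p = X.shape
    b = int(frac * n)
    freq = np.zeros((p, p))
    for _ in range(n_subsamples):
        idx = rng.choice(n, size=b, replace=False)
        prec = GraphicalLasso(alpha=alpha).fit(X[idx]).precision_
        freq += (np.abs(prec) > 1e-8) & ~np.eye(p, dtype=bool)
    theta = freq / n_subsamples
    iu = np.triu_indices(p, 1)              # distinct edges only
    return float(np.mean(2 * theta[iu] * (1 - theta[iu])))

# With independent features the true graph is empty, so heavy
# regularization yields a stable empty graph with instability near zero.
# StARS would scan a penalty grid from large to small and keep the least
# regularization whose (monotonized) instability stays below a cutoff.
rng = np.random.default_rng(1)
X = rng.standard_normal((120, 5))
instab = edge_instability(X, alpha=0.9, n_subsamples=5)
print(instab)
```

The instability is bounded in [0, 0.5]; graphs that flicker in and out under subsampling push it toward 0.5, which is what the penalty selection guards against.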
Structured, sparse regression with application to HIV drug resistance
We introduce a new version of forward stepwise regression. Our modification
finds solutions to regression problems where the selected predictors appear in
a structured pattern, with respect to a predefined distance measure over the
candidate predictors. Our method is motivated by the problem of predicting
HIV-1 drug resistance from protein sequences. We find that our method improves
the interpretability of drug resistance while producing comparable predictive
accuracy to standard methods. We also demonstrate our method in a simulation
study and present some theoretical results and connections.
Comment: Published in the Annals of Applied Statistics (http://www.imstat.org/aoas/) by the Institute of Mathematical Statistics (http://www.imstat.org), DOI: http://dx.doi.org/10.1214/10-AOAS428.
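A toy version of distance-penalized forward stepwise conveys the idea. The objective here is a hypothetical stand-in for the paper's formulation: each candidate's residual sum of squares is inflated by lam times its minimum distance (under a user-supplied measure D, e.g. positions along a protein sequence) to the predictors already selected, nudging the selection toward structured patterns.

```python
import numpy as np

def structured_stepwise(X, y, D, k, lam=1.0):
    """Greedy forward stepwise: at each step pick the predictor minimizing
    RSS + lam * (min D-distance to already-selected predictors);
    the first pick carries no distance penalty."""
    selected = []
    for _ in range(k):
        best_j, best_score = None, np.inf
        for j in range(X.shape[1]):
            if j in selected:
                continue
            cols = selected + [j]
            coef, *_ = np.linalg.lstsq(X[:, cols], y, rcond=None)
            rss = float(np.sum((y - X[:, cols] @ coef) ** 2))
            dist = min(D[j, s] for s in selected) if selected else 0.0
            if rss + lam * dist < best_score:
                best_j, best_score = j, rss + lam * dist
        selected.append(best_j)
    return selected

# Toy demo: predictors indexed by position (D = index distance), with the
# signal carried by two adjacent positions, 2 and 3.
rng = np.random.default_rng(0)
X = rng.standard_normal((150, 10))
y = 2.0 * X[:, 2] + 2.0 * X[:, 3] + 0.1 * rng.standard_normal(150)
idx = np.arange(10)
D = np.abs(idx[:, None] - idx[None, :]).astype(float)
print(structured_stepwise(X, y, D, k=2))
```

Raising lam trades a little predictive fit for more spatially coherent (and hence more interpretable) selections, which mirrors the interpretability/accuracy trade-off the abstract describes.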